OVERVIEW OF PROJECT



The developing "proficiency" movement within the U.S. language teaching
community, stimulated in large part by the efforts of the American Council
on the Teaching of Foreign Languages (ACTFL) and with the assistance of several
government language training agencies under the auspices of the Interagency
Language Roundtable (ILR), places a premium on language skills, especially
listening comprehension and speaking, within a real-life language use context.
Face-to-face speaking tests of the ACTFL/ILR type (i.e., "live" interviews
conducted by a trained interviewer/rater and scored on the basis of the
ACTFL/ILR verbal descriptive guidelines) have been quite widely implemented
within the major languages such as French and Spanish by means of training
workshops and associated testing networks. However, for the
less-commonly-taught language programs in the United States, financial and
organizational realities, at least for the present and near-term future, are
such as to preclude the development and deployment of sufficient numbers of
trained interviewers and raters to adequately meet the speaking testing needs
at issue in the adoption of a "proficiency" approach to language instruction.

A major objective of the reported project was to develop and make
available a tape-based, alternative approach to interview-based speaking
proficiency testing that would be economically viable for use with languages
having relatively low student volumes, but that at the same time would be
closely modeled on, and readily interpretable in terms of, the ACTFL/ILR
scale. To accomplish this goal, alternate forms of such a test were developed
in Chinese and validated on a representative student population through direct
statistical comparison with the regular ACTFL/ILR interview and rating
procedure. Validation results showed a substantial correspondence between
student scores on the tape-based test and the results of the live interview,
providing considerable confidence in the value of the tape-based test as an
appropriate and effective alternative to the live interview in situations
where use of the latter was not financially or administratively feasible.

The second, closely related objective was to develop a handbook to assist
organizations or groups considering the development of similar assessment
procedures for other less-commonly-taught languages. Since much of the
information concerning test development and administration procedures that
would usually appear in a final project report is provided in this handbook
(appended for reference at the end of this report), the following pages will
concentrate on an overview description of major project activities and on the
statistical results of the test validation project. The reader is asked to
refer to the handbook section for more detailed discussion of test rationale
and test development, as well as to the Chinese test booklets and scripts
themselves, which are reproduced in Appendix A.

MAJOR PROJECT ACTIVITIES 

Project Working Committee Meeting/Initial Test Planning

The day-to-day work of the project was conducted primarily at the Center
for Applied Linguistics (CAL) by the project director, John L. D. Clark,
working with other CAL staff and in close coordination with the project
co-director, Dr. Ying-che Li (University of Hawaii). Project planning, review
of materials, and expert consultation throughout the project period were
provided by a Working Committee consisting of, in addition to the project
director and co-director:

Dr. Albert Dien (Stanford University) 

Dr. Shang-Hsien Ho (University of Hawaii) 

Dr. Timothy Light (Ohio State University) 

Dr. Eugene Liu (University of Pennsylvania) 

Dr. Pardee Lowe (CIA Language School) 

Dr. A. Ronald Walton (University of Maryland) 

An initial planning meeting of the committee was held on August 3-5, 1985, in
which both the proposed format and question types for the test were developed,
subject to possible modifications on the basis of clinical tryouts of a draft
form of the test. The test as initially designed consisted of five separate
sections, as follows:

Personal conversation - Student listens to conversational questions in 
Chinese and responds to each question as it is asked. 

Single picture descriptions - Examinee looks at detailed line drawings 
and answers questions about them. 

Picture sequences - Examinee describes a series of events in a narrative 
fashion, based on a sequence of 3-5 line drawings. 

Discourse - Relatively longer discourse is elicited by means
of printed English questions to which the examinee replies in Chinese.

Situations - Examinee reads a printed description of a real-life situation
in which a specified interlocutor and communicative task are identified.
Examinee is to carry out the indicated task.



Draft Test Administration 

Over November 1984 - January 1985, a preliminary version of the test, based
on the above content specifications, was administered to a total of 27 students
at five institutions: Cornell University, Defense Language Institute
(Monterey), University of Hawaii, University of Pennsylvania, and Stanford
University. Based on this tryout, the overall format and content of the test
were generally confirmed, with relatively minor modifications suggested
(e.g., some shortening of the English directions; moderate increase in time
allotted for student response to certain questions, etc.).

Preparation of Final Test Forms/Validation Administration 

On the basis of the information obtained during the trial administration,
four separate final forms of the test were developed, each using similar
question types but having different topical content. To validate each of the
tests, both as (1) constituting essentially interchangeable versions of the
test (i.e., producing similar examinee results independently of the particular
form administered) and (2) producing the same scoring result as the regular
"live" interview for a given examinee, an administration was carried out in
which each of 32 students took two forms of the project-developed test, as
well as a face-to-face interview.

Based on test administrations conducted at Brigham Young University and at
the University of Hawaii, the tapes (3 per student--one for each of two of the
project-developed tests and one of the live interview) were independently
scored by each of two certified ACTFL interviewer/raters, a procedure which
allowed for the determination of both the inter-rater reliability of the
project tests and the extent of correlation between the project tests and the
scores assigned on the basis of the face-to-face interview. These results are
described and discussed in the following section.

Validation Study and Results 

As indicated above, the test validation paradigm used in this study
involved the administration of a highly face-valid criterion instrument
(face-to-face interview using ACTFL-trained interviewer/raters) and two forms
of the experimental semi-direct test to each of 32 native English-speaking
students of Chinese, consisting of 16 students at the University of Hawaii and
16 students at Brigham Young University. At both institutions, participating
students were drawn from among a volunteer group expressing interest in the
project, with final selection made (on the basis of instructors' general
familiarity with their speaking performance) so as to provide a wide and, to
the extent possible, rectangular distribution of [...]. Each participating
student received a small honorarium of [...] for the [...] total hours of
testing involved. The Chinese specialists responsible for administering and
rating the live interviews, as well as for listening to and rating the student
response tapes for each of the semi-direct tests, were Dr. Richard Chi of
Brigham Young University and Dr. Shang-Hsien Ho of the University of Hawaii,
both ACTFL-certified interviewer/raters in Chinese.

For all students at both BYU and Hawaii, the live interview was
administered first, followed by two forms (out of the total of four forms) of
the tape-administered semi-direct test. Designation of the two test forms to
be administered to a given student, as well as the order in which the forms
were administered, was on the basis of a Latin square design which served to
control for possible sequence effects in test form administration. To [...]
procedure and/or rating assigned, Dr. Chi conducted each of the interviews of
[...] students and Dr. Ho conducted each of [...].

Except in 2-3 instances in which scheduling difficulties mandated the
testing of a given student over a two-day period, all tests (live interview
plus two taped-test forms) were administered within a single day. Informal
conversation with the students following test administration indicated that
they did not consider the amount of testing (approximately 15-35 minutes for
the live interview and [...] minutes each for the taped tests) unduly onerous
or fatiguing. In addition to the cassette recordings of the taped test, audio
recordings of the face-to-face interview were also made, using [...].

Scoring of both the live interview and tape-based semi-direct tests was done
on a 13-point scale combining both the ACTFL and ILR rating scales, as
follows:

ACTFL/ILR Level        Coded As:

Novice-Low             01
Novice-Mid             02
Novice-High            03
Intermediate-Low       04
Intermediate-Mid       05
Intermediate-High      06
Advanced               07
Advanced-Plus          08
Level 3                09
Level 3+               10
Level 4                11
Level 4+               12
Level 5                13
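
For readers who wish to tabulate ratings electronically, the coding scheme above can be expressed as a simple lookup. The Python sketch below is an illustration only; the names `SCALE`, `CODE`, `encode`, and `decode` are hypothetical and do not come from the project materials.

```python
# Combined ACTFL/ILR rating scale, coded 1-13 as in the table above.
SCALE = [
    "Novice-Low", "Novice-Mid", "Novice-High",
    "Intermediate-Low", "Intermediate-Mid", "Intermediate-High",
    "Advanced", "Advanced-Plus",
    "Level 3", "Level 3+", "Level 4", "Level 4+", "Level 5",
]

# Map each level name to its numeric code (1-based position on the scale).
CODE = {name: i for i, name in enumerate(SCALE, start=1)}


def encode(level: str) -> int:
    """Return the numeric code (1-13) for an ACTFL/ILR level name."""
    return CODE[level]


def decode(code: int) -> str:
    """Return the level name for a numeric code (1-13)."""
    if not 1 <= code <= len(SCALE):
        raise ValueError(f"code must be between 1 and {len(SCALE)}")
    return SCALE[code - 1]
```

Coding the levels as consecutive integers is what makes the correlational analyses reported below possible, although it treats the distance between adjacent levels as equal, which is an assumption of the coding rather than a property of the scale itself.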



The several tables below provide descriptive statistics, interrater
reliabilities, and other statistics for the ratings assigned by each of the
two raters, Dr. Chi (hereafter, Rater 1) and Dr. Ho (Rater 2), to student
performances on each of the semi-direct test forms and on the live interview.



Table 1

Descriptive Statistics for Scoring Levels Assigned, Taped and Live Tests

Test Form              Rater 1            Rater 2

RANGE
A (N=16)               4-11               4-11
B (N=15)               4-11               4-11
C (N=16)               5-11               4-10
D (N=16)               [?]-11             4-10
Interview (N=32)       4-11               4-12

MEAN
A                      8.0                6.9
B                      7.8                6.9
C                      7.6                6.6
D                      7.3                6.5
Interview              7.7                7.3

MEDIAN/MODE
A                      8/8                7/7
B                      8/8                7/7
C                      8/5,8 (bimodal)    7/7
D                      8/8                6.5/4
Interview              8/8                7/7



STANDARD DEVIATION
(Forms A, B, C, D, and Interview; column alignment lost, values in source order)
1.9   1.9   1.7   2.0   1.8   1.8   2.0   1.9   2.0

Interrater reliabilities (Pearson product-moment correlations) between the
ratings assigned by Rater 1 and those assigned by Rater 2 for each of the
semi-direct test forms and for the live interview are shown in Table 2 below.

Table 2

Interrater Reliabilities

Test Form              Correlation

A                      .89
B                      .96
C                      .93
D                      .91
Interview              .88
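
As an illustration of how an interrater coefficient of this kind is obtained, the following Python sketch computes a Pearson product-moment correlation from two raters' coded ratings. The function name `pearson_r` and the sample ratings are hypothetical; the ratings are invented for the example and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical ratings on the 13-point scale (NOT data from the study).
rater_1 = [4, 5, 7, 8, 8, 9, 10, 11]
rater_2 = [4, 4, 6, 7, 7, 8, 10, 10]
r = pearson_r(rater_1, rater_2)
```

Note that the coefficient stays high when one rater is consistently more generous than the other, because a constant offset does not change the correlation. This is why agreement in the absolute values of the ratings has to be examined separately.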



Test-retest reliabilities for the same student taking two different test
forms, with the same rater scoring both forms, are shown in Table 3.

Table 3

Test-Retest Reliabilities (Same Rater)

Tests Taken By Student     Rater 1     Rater 2

Forms A and B              [...]       .99
Forms C and D              [...]       .93



Table 4

Test-Retest Reliabilities (Different Forms and Raters)

Rater/Form Combination                     Correlation

Rater 1/Form A     Rater 2/Form B          .90
Rater 1/Form B     Rater 2/Form A          .94
Rater 1/Form C     Rater 2/Form D          .91
Rater 1/Form D     Rater 2/Form C          .91



Table 5

Correlations with Live Interview

                       Interview as Scored     Interview as Scored
Rater/Form             by Rater 1              by Rater 2

Rater 1/Form A         .98                     .86
Rater 1/Form B         .97                     .91
Rater 1/Form C         .96                     .90
Rater 1/Form D         .97                     .89

Rater 2/Form A         .90                     .98
Rater 2/Form B         .93                     .97
Rater 2/Form C         .92                     .92
Rater 2/Form D         .91                     .92



Interrater reliability of the live interview scoring was .88. (Test-retest
reliability information for the live interview is not available, since all
interview scoring was based on both raters listening to a single tape-recorded
interview for any given student.)

As a general summary of the statistical information above, it may be
stated that all four forms of the experimental semi-direct test reveal high
interrater reliabilities, with Pearson product-moment correlations uniformly
at about .90 or higher. Test-retest reliabilities are also in the .90 and
higher range under the most "severe" conditions (i.e., different raters
rating two different forms), and are even higher (mid-.90s) for test-retest
involving the same rater. Test validity coefficients (correlations of the
semi-direct tests against the live interview as an external criterion) are
also high, ranging from .86 to .98, with a mean value of .93 across 16
different test form/rater/interview rater combinations.
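
The range and mean quoted for the validity coefficients can be checked directly against the 16 values in Table 5. The short Python sketch below does this with a plain arithmetic mean; the variable names are hypothetical, and the values are as read from Table 5.

```python
# Validity coefficients from Table 5, keyed by
# (rater scoring the taped test, test form, rater scoring the live interview).
coefficients = {
    (1, "A", 1): 0.98, (1, "B", 1): 0.97, (1, "C", 1): 0.96, (1, "D", 1): 0.97,
    (1, "A", 2): 0.86, (1, "B", 2): 0.91, (1, "C", 2): 0.90, (1, "D", 2): 0.89,
    (2, "A", 1): 0.90, (2, "B", 1): 0.93, (2, "C", 1): 0.92, (2, "D", 1): 0.91,
    (2, "A", 2): 0.98, (2, "B", 2): 0.97, (2, "C", 2): 0.92, (2, "D", 2): 0.92,
}

values = list(coefficients.values())
mean_r = sum(values) / len(values)

# (lowest coefficient, highest coefficient, mean rounded to two places)
summary = (min(values), max(values), round(mean_r, 2))
```

The same dictionary also makes the same-rater versus different-rater contrast easy to inspect, since the first and last elements of each key identify the two raters involved.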

These correlational results appear to indicate that there is a strong and
direct relationship among sets of assigned scores on the four semi-direct
test forms for a given group of students, from the three basic standpoints of
interrater reliability, test-retest ("alternate form") reliability, and
correlation with an external, more highly face- and content-valid criterion
[...].

In addition to reliability coefficients per se, a second aspect of test
performance which requires examination is the extent to which the absolute
values of the assigned ratings remain the same across different [...].
Alternatively stated, it is necessary to determine the extent to which:

(1) on any given semi-direct test form, students will receive similar
scores regardless of the particular rater evaluating that test;

(2) students who receive a given score on one form of the test will
receive the same score on each of the other test forms, whether these are
rated by the original rater or some other rater;

(3) a student rated at a given level on the basis of the semi-direct
test will receive the same rating on a live face-to-face interview.

Tables 6-33 show the two-way crosstabulations of raters, semi-direct test
scores, and interview scores that address these three questions.
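
A two-way crosstabulation of this kind, together with the proportion of identical and near-identical rating pairs, can be sketched in Python as follows. The function names `crosstab` and `agreement` are hypothetical, and the sample ratings are invented for the example; they are not data from the study.

```python
from collections import Counter


def crosstab(x_ratings, y_ratings):
    """Count how often each (x, y) pair of coded ratings occurs."""
    if len(x_ratings) != len(y_ratings):
        raise ValueError("rating lists must have the same length")
    return Counter(zip(x_ratings, y_ratings))


def agreement(x_ratings, y_ratings):
    """Return the proportions of identical pairs and of pairs within one scale point."""
    table = crosstab(x_ratings, y_ratings)
    n = sum(table.values())
    exact = sum(c for (x, y), c in table.items() if x == y)
    within_one = sum(c for (x, y), c in table.items() if abs(x - y) <= 1)
    return exact / n, within_one / n


# Hypothetical ratings on the 13-point scale (NOT data from the study).
rater_1 = [4, 5, 7, 8, 8, 9, 10, 11]
rater_2 = [4, 4, 6, 7, 8, 8, 10, 9]
exact, within_one = agreement(rater_1, rater_2)
```

Cells on the diagonal of the crosstab (x equal to y) are identical ratings. Cells one step off the diagonal correspond to a difference of one point on the coded scale.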

Interrater reliability - Tables 6-9 show an at least slight generosity in
rating on the part of Rater 1 by comparison to the scores assigned by Rater 2,
a tendency which is evident across all four forms of the test. For the most
part, the magnitude of the difference is one scale point (e.g., [...] vs.
Intermediate-High), a degree of variation that would be characterized as [...]
on the regular ACTFL/ILR scale. However, in [...] instances, the difference is
a full level, and in one case (Table 6), two full levels, an apparent one-time
anomaly considering the rating [...] as a whole. For comparison purposes,
crosstabs for interrater reliability for the live interview scores (Table 34)
show a similar clear pattern of proportionately higher scores on the part of
Rater 1.
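
The size of this tendency can also be gauged from the mean ratings in Table 1. The Python sketch below subtracts Rater 2's mean from Rater 1's mean for each test; it is offered only as a rough check, the variable names are hypothetical, and the means are as read from Table 1.

```python
# Mean ratings from Table 1 as (Rater 1, Rater 2), on the 13-point scale.
means = {
    "Form A": (8.0, 6.9),
    "Form B": (7.8, 6.9),
    "Form C": (7.6, 6.6),
    "Form D": (7.3, 6.5),
    "Interview": (7.7, 7.3),
}

# Positive values mean Rater 1 rated higher on average.
mean_gap = {name: round(r1 - r2, 1) for name, (r1, r2) in means.items()}
```

Because both raters scored the same students on each test, the difference between the two means equals the average per-student difference between the raters.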

Test-retest reliability - Cross-tabs for different test forms as scored by the
same rater are shown in Tables 10-13. Overall, there appears to be a strong
correspondence between the [...] on alternate forms of the semi-direct test
when these are being scored by the same rater. Form A vs. Form B as scored by
Rater 1 shows only occasional "plus-point" variations which are not
consistently in either direction; for Form C vs. Form D, [...] data pairs show
plus-point generosity in favor of Form C, with one full-level difference in
the same direction observed. Rater 2 gives virtually identical [...] Forms A
and B, with only two plus-point discrepancies in total. For Forms C and D, all
discrepancies are within a "plus" point, but in a greater [...] test-retest
scores were [...] available (A/B and C/D), [...] scores in the two instances
and with practically no discrepancy greater than a [...].

[Tables 6-34: two-way crosstabulations of assigned ratings, with rows and
columns labeled Inter-Low (4) through Level 4 (11), and through Level 4+ (12)
on the Y-axis of Table 34. Cell counts are not recoverable. Axes are listed
below.]

Table 6     X-Axis: Rater 1 - Form A             Y-Axis: Rater 2 - Form A
Table 7     X-Axis: Rater 1 - Form B             Y-Axis: Rater 2 - Form B
Table 8     X-Axis: Rater 1 - Form C             Y-Axis: Rater 2 - Form C
Table 9     X-Axis: Rater 1 - Form D             Y-Axis: Rater 2 - Form D
Table 10    X-Axis: Rater 1 - Form A             Y-Axis: Rater 1 - Form B
Table 11    X-Axis: Rater 1 - Form C             Y-Axis: Rater 1 - Form D
Table 12    X-Axis: Rater 2 - Form A             Y-Axis: Rater 2 - Form B
Table 13    X-Axis: Rater 2 - Form C             Y-Axis: Rater 2 - Form D
Table 14    X-Axis: Rater 1 - Form A             Y-Axis: Rater 2 - Form B
Table 15    X-Axis: Rater 1 - Form B             Y-Axis: Rater 2 - Form A
Table 16    X-Axis: Rater 1 - Form C             Y-Axis: Rater 2 - Form D
Table 17    X-Axis: Rater 1 - Form D             Y-Axis: Rater 2 - Form C
Table 18    X-Axis: Rater 1 - Form A             Y-Axis: Rater 1 - Interview Score
Table 19    X-Axis: Rater 1 - Form B             Y-Axis: Rater 1 - Interview Score
Table 20    X-Axis: Rater 1 - Form C             Y-Axis: Rater 1 - Interview Score
Table 21    X-Axis: Rater 1 - Form D             Y-Axis: Rater 1 - Interview Score
Table 22    X-Axis: Rater 2 - Form A             Y-Axis: Rater 2 - Interview Score
Table 23    X-Axis: Rater 2 - Form B             Y-Axis: Rater 2 - Interview Score
Table 24    X-Axis: Rater 2 - Form C             Y-Axis: Rater 2 - Interview Score
Table 25    X-Axis: Rater 2 - Form D             Y-Axis: Rater 2 - Interview Score
Table 26    X-Axis: Rater 1 - Form A             Y-Axis: Rater 2 - Interview Score
Table 27    X-Axis: Rater 1 - Form B             Y-Axis: Rater 2 - Interview Score
Table 28    X-Axis: Rater 1 - Form C             Y-Axis: Rater 2 - Interview Score
Table 29    X-Axis: Rater 1 - Form D             Y-Axis: Rater 2 - Interview Score
Table 30    X-Axis: Rater 2 - Form A             Y-Axis: Rater 1 - Interview Score
Table 31    X-Axis: Rater 2 - Form B             Y-Axis: Rater 1 - Interview Score
Table 32    X-Axis: Rater 2 - Form C             Y-Axis: Rater 1 - Interview Score
Table 33    X-Axis: Rater 2 - Form D             Y-Axis: Rater 1 - Interview Score
Table 34    X-Axis: Rater 1 - Interview Score    Y-Axis: Rater 2 - Interview Score

As would be expected, there is somewhat greater inter-form variation in
assigned scores when different raters, as well as different forms, are
involved. As indicated rather clearly in Tables 14-17, scores assigned by
Rater 1 are almost always higher than those assigned by Rater 2, a tendency
[...] of the test forms involved. [...] discrepancy reflects a full-level
difference [...].

Tables 18-33 show correspondences between scores assigned to students on the
semi-direct tests by comparison to those given on the live interview. These
results appear very similar to those obtained for the comparisons involving
alternate forms of the semi-direct test. As scored by the same rater (Tables
18-21 for Rater 1; Tables 22-25 for Rater 2), the obtained pairs of scores are
either identical or different [...] by no more than a plus point in either
direction. However, when different raters are involved in either evaluating
the semi-direct test or the live interview, scoring differences are much more
appreciable, with Rater 1 clearly more generous than his colleague, regardless
of whether Rater 1 is evaluating the semi-direct test (Tables 26-29) or the
live interview (Tables 30-33).

In summary, it may be suggested that, to the extent warranted by the
two-rater, four-test-form comparisons available in this study, it is [...] a
high level of congruence of the absolute values of assigned scores [...] both
across forms of the semi-direct test and between the semi-direct test and the
live interview [...]. When different raters are involved, either scoring the
same or different test forms or the live interview vs. the taped test, an
[...] higher incidence of differences in the absolute values of the scores is
shown, even though the linear correlations themselves remain high.

With regard to the practical applications of the semi-direct test, and
pending [...] examination of the scoring reliability of both the semi-direct
test and the live interview in a number of other contexts involving a variety
[...], it may be tentatively [...] that:

Holding the rater constant, different forms of the semi-direct test
provide largely equivalent scoring results on a test-retest basis.

The semi-direct test forms developed in this study provide results that are
largely equivalent to those obtained in the live interview, again holding the
rater constant for both types of test.

Some discrepancy may be anticipated in the scoring results for one form of
the semi-direct test vs. another form, or for semi-direct test vs. live
interview results, when different raters are used to provide these data.
The observed discrepancies do not appear to be attributable to the format or
[...] characteristics of the semi-direct test as such, but also occur [...] in
the scoring of the live interview.

[...] of testing, close attention to the nature and effectiveness of the
rater training process as related to the participants' [...]

STUDENT FEEDBACK ON SEMI-DIRECT TESTING 

Feedback information from the participating students concerning various
aspects of their experience with and opinions about the semi-direct testing
procedure was elicited by means of a short questionnaire (Appendix B) which
was administered after both the live and semi-direct testing had been
completed. Twenty-seven of the total of 32 participants (84%) returned a
completed questionnaire, the results of which are summarized below.

The first two questions asked for a student comparison between the live
and semi-direct formats in terms of the extent to which each of these
testing approaches had succeeded in eliciting the highest level of language
performance of which they were capable. The two questions read as follows:

"In the live interview, do you feel that your maximum level of
speaking ability in Chinese was adequately probed by the tester?"

"In the case of the taped test, do you feel that the descriptions,
narrative situations, and other types of questions in the test were adequate to
probe your maximum level of speaking proficiency in Chinese?"

The generally comparable relative percentages of "yes" and "no" responses for
each of these questions (Figures 1 and 2) suggest that the examinees, for the
most part, saw little difference between the live and taped formats with
respect to how fully their speaking performance had been probed.

A second pair of questions asked for a similar comparison of
the overall "fairness" of the two testing approaches:

"In the live interview, were there any questions asked or speaking
situations required which you felt were in any way 'unfair'?"

"In the taped test, were there any picture/descriptions, narratives,
situations, or other questions that you felt were in any way 'unfair'?"

As shown in Figures 3 and 4, virtually no students felt that they had been
asked "unfair" questions by the live interviewer, while 30 percent felt
that at least some portion of the taped test had included such questions.
Write-in comments indicated that, for the most part, students were referring
to particular questions they had not been able to deal with properly for lack
of proficiency, rather than as a result of intrinsic faults in the questions
or testing procedures per se. However, two students suggested that the
directions for the series-of-pictures section should be revised to indicate
more clearly that each of the pictures in the set should be addressed in a
sequential manner.

[Figures 1-9: charts of questionnaire responses. Only the titles below are
recoverable.]

Figure 1
Figure 2    ABILITY PROBED IN TAPED TEST?
Figure 3    UNFAIR QUESTIONS IN LIVE INTERVIEW?
Figure 4    UNFAIR QUESTIONS IN TAPED TEST?
Figure 5    IN WHICH TEST MORE NERVOUS?
Figure 6    WHICH TEST MORE DIFFICULT?
Figure 7    TAPE PAUSES LONG ENOUGH?
Figure 8    TEST DIRECTIONS CLEAR?
Figure 9    PREFER LIVE INTERVIEW OR TAPED?

Figure 5 summarizes the responses to the question, "In which of the two
situations, live interview or taped test, did you feel the more anxious or
nervous?" A majority of respondents (just over 60%) felt that they had been more
nervous during the taped test, while equal numbers (19% in each case) were
divided between considering both types of test equally nervousness-producing or
attributing this characteristic predominantly to the live interview.

Notwithstanding the essentially equivalent scores which they obtained 
under both the live and taped tests (scoring results were not communicated to 
the students until several days after questionnaire administration), the great 
majority (78%) considered the taped test "more difficult" than the live
interview, with only 7 percent having the opposite opinion (Figure 6). 

With regard to certain technical aspects of the taped test, most of the 
respondents (56%) felt that the length of pauses provided on the tape was
"usually about right" for them to respond as fully as they desired or were
able. Pauses were considered generally "too long" by 19 percent and "too
short" by 26 percent (Figure 7). A large majority (85%) considered the taped
test directions "sufficiently clear and detailed," with only 12 percent of the 
contrary opinion (Figure 8). 

To the "bottom-line" question, "Assuming that you would receive the same 
score through both techniques, would you personally rather take a live 
interview or a (single) taped test in order to show your speaking 
proficiency?," examinee responses were overwhelmingly (89%) in favor of the
live interview, with only 4 percent expressing a preference for the taped test
(Figure 9).

The overall results of this brief survey of student opinions concerning 
the semi-direct testing procedure, both in its own right and by comparison to
live face-to-face interviewing, appear to suggest that while students view
the taped test as generally well instructed and sufficiently probing from the
standpoint of elicitation procedures, they feel it is more difficult than the
live interview and tend to consider at least portions of the test as "unfair."
In a forced choice between the two types of testing, the great majority of 
examinees indicate a personal preference for undergoing a live interview rather 
than a tape-recorded test. From an administrative viewpoint, implications of
the questionnaire data would seem to be that face-to-face interviewing
is preferable whenever the necessary resources can be made available, but that 
when an alternative approach is required, the students involved will generally 
consider themselves adequately tested through semi-direct means, albeit as a
"second choice" procedure.

DISSEMINATION OF STUDY RESULTS 

Camera-ready copy for all four forms of the tests developed under this
project, as well as master stimulus tapes, is presently housed at the Center
for Applied Linguistics (CAL). CAL intends to make copies of the test
materials, as well as a test scoring service, available to the field on a
cost-recovery fee basis within the near future. Copies of the test development
handbook will also be available through CAL.